Abstract

The straightforward application of a sequential estimator to nonlinear regression (curve fitting) problems is generally not possible when good a priori parameter estimates are not available and when minimizing the error over a local portion of the data does not ensure that it is minimized globally. However, a sequential estimator may be easily utilized to perform off-line processing in such a situation. The key is to process the measurements in a random order rather than the causal order in which they occur.

The off-line use of an extended Kalman filter is illustrated in terms of a particular application. This technique is essentially a sequential version of the Gauss-Newton minimization procedure with relinearization being performed after each measurement is processed. Fictitious measurement noise is necessary to prevent filter divergence and is included in a very simple manner.

Computational savings over more conventional iterative minimization techniques are possible if the functions and partial derivatives involved are sufficiently complex to evaluate. But there is a real question regarding the extent to which convergence can be assured. The results of simulations are presented.
Chapter 1

The Problem Statement

We will restrict our attention to nonlinear models of the form

$$y(\theta_k) = h(\theta_k) + v_k, \qquad k = 1, 2, \ldots, N$$

$$h(\theta_k) = \sum_{i=1}^{n} c_i \, \theta_k^{\,i-1} f(\theta_k) \tag{1}$$

where

i) $f(\theta)$ is a known continuous function that is even and monotonically decreasing in $|\theta|$.

ii) $\lim_{|\theta| \to \infty} \theta^{\,n-1} f(\theta) = 0$

iii) $\theta$ is a function of time and some parameters whose values are unknown. The c's are also unknown.

iv) $n$ is known.

v) The number of data points available and the signal-to-noise ratio are such that a curve fitting technique, that is, estimating parameters by minimizing some error criterion, is viable.

For the purposes of this paper, it will also be assumed that

vi) $v_k$ is a Gaussian random variable ~ $N(0, \sigma^2)$, $\sigma^2$ known.

vii) $\theta_k = S(t_k - T_0)$, where $S$ and $T_0$ are unknown scale and location parameters, respectively.

One wishes to estimate the parameters $c_1, c_2, \ldots, c_n, S, T_0$.

This is a nonlinear (in the parameters) optimization problem.

For off-line processing, there are several well-known iterative minimization techniques (IMT), such as those of the descent type, that are obvious possibilities.

Moreover, the nonlinear model in question is a 'separable' model; its parameters can be separated into two groups, those which appear linearly ($c_1, c_2, c_3$) and those which appear nonlinearly ($S, T_0$). This class of nonlinear models is significant both because of the wide variety of applications in which it appears and because its structure can be exploited in off-line optimization to achieve computational efficiency [1].

We will first, however, investigate the suitability of sequential estimation of the parameters. By 'sequential estimation' is meant calculating an updated estimate as each sampled measurement of $y(\theta)$, that is $y(\theta_k)$, arrives. This calculation is made using only the previous estimate and the current observation.

The reason for considering sequential estimation is that in at least one application, the estimation of ship movement by means of a fixed sensor measuring magnetic field intensity, the signal can arrive over a period of a minute or so with the spacing between sampled points being on the order of tenths of a second. Not only may enough time be available for real time sequential processing but it may also be desirable to obtain reasonably accurate estimates as early as possible.
Chapter 2

Some Difficulties With Sequential Processing

The model of (1) can be put in the Kalman estimation framework with $x$, the state vector, comprised of the parameters to be estimated. Then:

$$x_{k+1} = x_k$$

$$y_k = h_k(x_k) + v_k \tag{2}$$

where there is no system noise present.

Nonlinear sequential estimators, such as the extended Kalman filter, are most successful when an initial reference "trajectory", about which one can linearize, is known to a high degree of confidence. For this discussion and for the later examples we will assume that initial nonlinear parameter estimates accurate to about a factor of two are available.

The main difficulty that is encountered in attempting to sequentially estimate the parameters of (1) is that minimizing the error (between the observations and those predicted using the estimated parameters) over a local section of the waveform does not guarantee a good fit over the entire waveform.

In particular, for a local section of the waveform many sets of parameter estimates will produce almost equally good fits.

To illustrate this point, consider the model of (1) with


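$$f(\theta_k) = e^{-\theta_k^2}$$

$$S = .014, \quad T_0 = 200., \quad c_1 = 1.0, \quad c_2 = 1.0, \quad c_3 = -.1, \quad \sigma^2 = 10^{-4}, \quad N = \text{[illegible]}00 \tag{3}$$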
Fig. 1 is a plot of $y(\theta_k)$.

For different values of ($\hat{S}, \hat{T}_0$), the least square estimates of ($\hat{c}_1, \hat{c}_2, \hat{c}_3$) were calculated using the noise corrupted waveform of Fig. 1. The sum of the square errors between the $y(\theta_k)$ and the model using each such set of ($\hat{S}, \hat{T}_0, \hat{c}_1, \hat{c}_2, \hat{c}_3$) was also calculated. Because of the least square error criterion, the linear parameter estimates, $\hat{c}_1, \hat{c}_2, \hat{c}_3$, are a function of the chosen ($\hat{S}, \hat{T}_0$). Thus a three-dimensional plot of the error versus various values of ($\hat{S}, \hat{T}_0$) is informative.

Moreover, Golub has established that there is a direct relationship between the extrema in this plot and those in the space of all the parameters.
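The error surface over ($\hat{S}, \hat{T}_0$) described above can be sketched numerically. The following is only an illustration and not the computation used for the figures. It assumes that (1) has the form $h = (c_1 + c_2\theta + c_3\theta^2) f(\theta)$ with $f(\theta) = e^{-\theta^2}$, that the samples fall at $t_k = k$ for $k = 1, \ldots, 400$, and that the grid limits are arbitrary. The names `basis` and `reduced_error` are hypothetical.

```python
import numpy as np

def basis(S, T0, t, n=3):
    """Columns theta^(i-1) * f(theta), with f(theta) = exp(-theta^2) (assumed form)."""
    theta = S * (t - T0)
    f = np.exp(-theta**2)
    return np.column_stack([theta**i * f for i in range(n)])

def reduced_error(S, T0, t, y):
    """Sum of squared residuals after solving for the linear c's by least squares."""
    A = basis(S, T0, t)
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ c
    return float(r @ r), c

rng = np.random.default_rng(0)
t = np.arange(1.0, 401.0)                     # assumed sample times t_k = k
c_true = np.array([1.0, 1.0, -0.1])
# Noise standard deviation 1e-2 corresponds to sigma^2 = 1e-4.
y = basis(0.014, 200.0, t) @ c_true + rng.normal(0.0, 1e-2, t.size)

S_grid = np.linspace(0.007, 0.028, 22)        # arbitrary grid, within a factor of two
T_grid = np.linspace(100.0, 300.0, 11)
surface = np.array([[reduced_error(S, T0, t, y)[0] for T0 in T_grid]
                    for S in S_grid])

# Normalize and take -log10 so that a peak corresponds to an error minimum.
norm_surface = -np.log10(surface / np.sum(y**2))
i, j = np.unravel_index(np.argmax(norm_surface), norm_surface.shape)
best_S, best_T0 = S_grid[i], T_grid[j]
```

Restricting `t` and `y` to an initial segment (for example the first 100 samples) before building `surface` gives the kind of partial-data surface discussed in this chapter.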


The theorem, which is proven in Golub [1], deals with square error optimization applied to separable models. One can define two error functions. The first, $e_1^2(\hat{a})$, is the error as a function of both linear and nonlinear parameters. Now let $\hat{a}$ be partitioned into linear and nonlinear parameters, $\hat{a}^T = [\hat{a}_L^T, \hat{a}_{NL}^T]$. The second error function, $e_2^2(\hat{a}_{NL})$, utilizes the fact that for a given nonlinear parameter estimate, $\hat{a}_{NL}$, the least square estimate of the linear parameters, $\hat{a}_L$, is unique. Since the linear parameter estimates are then a function of the nonlinear parameter estimates, $e_2^2(\hat{a}_{NL})$ is a function of only the nonlinear parameters.

For our purposes, the theorem is relevant as it establishes that for a small enough neighborhood in the nonlinear parameter space, $\Omega$, where the matrix of basis functions has constant rank (see Appendix B), the following are true:

i) if $\hat{a}_{NL}$ is a critical point (or global minimizer in $\Omega$) of $e_2^2(\hat{a}_{NL})$, then ($\hat{a}_{L,LS}, \hat{a}_{NL}$) is also a critical point (or global minimizer for $\hat{a}_{NL} \in \Omega$) of $e_1^2(\hat{a})$.

ii) if ($\hat{a}_L, \hat{a}_{NL}$) is a global minimizer of $e_1^2(\hat{a})$ for $\hat{a}_{NL} \in \Omega$, then $\hat{a}_{NL}$ is a global minimizer of $e_2^2(\hat{a}_{NL})$ in $\Omega$.

Practically, (i) implies that a local minimum found in one of the previous error surfaces, which are functions only of the nonlinear parameters, would in fact be a minimum in the space of all the parameters. This is the space in which the RSKF (the random sampling Kalman filter introduced in Chapter 3), as well as iterative minimization techniques, would operate.

In particular, several plots were made, Figs. 2-5, using the first 100, 200, 300 and 400 sampled values of the waveform, respectively. The error was normalized by $\sum_{k=1}^{N} y^2(\theta_k)$ and $-\log_{10}$ of this error was actually plotted so that the z axis is a logarithmic scale and a peak corresponds to an error minimum.

As one might expect, with only the first 100 sampled values accessible (Fig. 2), the error surface is quite flat. There is no clearly optimum set of ($\hat{S}, \hat{T}_0$). As more data become available (Figs. 3-5), the optimal ($\hat{S}, \hat{T}_0$) does indeed develop at about ($\hat{S} = .014$, $\hat{T}_0 = 200$), while local minima decrease in significance.


A similar set of plots (Figs. 6-9) was made using a second function $f(\theta_k)$ [expression illegible in source]. This function arises in a particular application that is explained in some detail in Chapter 4. The parameters of (25) were used in generating these plots. With only 100 sampled values available, once again the error surface is rather flat.

This situation especially militates against the use of estimators that implicitly assume that the current parameter estimate adequately summarizes the information concerning the parameters.
The problem with a sequential estimator that processes the data in its causal order is that during the early sections of the waveform there are many parameter values that provide locally almost equally good fits. Then, when the filter is processing data in the latter section of the waveform it again has no information available to it concerning the fit in the other (earlier) parts of the waveform.

Recursive least square estimation of only linear parameter models does not suffer from this problem as the estimate at any time, $t_k$, is the least square estimate for all measurements received up till $t_k$. Nonlinear estimators generally do not enjoy this property.
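The property claimed for the linear case can be checked directly. The sketch below is illustrative only: the basis matrix `A`, the noise level and the initial covariance are made-up values, and the recursion is the standard recursive least squares update with unit measurement weighting. It shows that updating from only the previous estimate and the current sample reproduces the batch least square estimate over all samples received.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))                  # hypothetical known basis functions (linear model)
c_true = np.array([1.0, 1.0, -0.1])
y = A @ c_true + rng.normal(0.0, 0.01, 50)

c = np.zeros(3)                               # initial estimate (zero is acceptable when linear)
P = 1e6 * np.eye(3)                           # large initial covariance: little prior knowledge
for a_k, y_k in zip(A, y):
    K = P @ a_k / (a_k @ P @ a_k + 1.0)       # gain for a scalar measurement
    c = c + K * (y_k - a_k @ c)               # uses only the previous estimate and the new sample
    P = P - np.outer(K, a_k) @ P              # covariance update

c_batch, *_ = np.linalg.lstsq(A, y, rcond=None)
max_diff = float(np.max(np.abs(c - c_batch)))  # small, limited by the finite initial P
```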

An extended Kalman filter, for instance, maintains an error covariance matrix, whose trace is non-increasing (in the absence of system noise). Practically this means that a good fit during the early part of the waveform leads to unrealistically low error covariance estimates. These represent such a high degree of assumed confidence in the current estimate that when more recent measurements show a lack of fit, through the growth of the residuals (innovations), the gain is set small enough to ignore them and the filter ultimately diverges.

Before continuing it should be noted that the unsuitability of sequential estimation has been discussed only for a particular type of model and only within the limits of our earlier definition of 'sequential estimation'. Tenney et al. [2], for instance, obtained good results for a somewhat related model through a linearized version of the model equations that depended only on a single nonlinear parameter, thus allowing the use of parallel filters. Also, a real time estimator is possible if it stored all or some representative portion of the data received up till time $t_k$ and its estimate depended only on this data, not the previous estimate. The difficulty here, though, is computational.

While a simple sequential estimator may be inappropriate for real time processing, the next section will show that it is possible to effectively utilize it in an off-line manner.


Chapter 3

Off-Line Use of a Sequential Estimator

3.1 Introduction

The main impediment to the on-line implementation of a sequential estimator, in the context of the model (1), is that for the most recent estimate of the parameters, the error is only evaluated over a local section of the waveform. While one would like to evaluate the error over the entire waveform, this can only be done once the data has been completely received, that is, off-line. Since off-line processing may be justifiable in some applications, it will now be considered.

Given samples from the entire waveform, how can a sequential estimator, such as the extended Kalman filter, perform off-line processing? The simplest way would be to process the measurements not in their causal order ($y_1, y_2, y_3, \ldots$) but in some random order ($y_{37}, y_{205}, y_{80}, \ldots$). An obvious possibility is equiprobable sampling without replacement. The advantage of such a random sampling estimator is that over a number of iterations the filter obtains a measure of the error over the entire waveform.

Again, the three dimensional error plots of Chapter 2 can be used for illustrative purposes. Figs. 10-13 were generated with $f(\theta_k) = e^{-\theta_k^2}$ and the parameters of (3). However, the measurements were randomly sampled (without replacement). Thus Fig. 10 is based on the first 10 randomly selected measurements. Fig. 11 uses an additional 30 measurements for a total of 40 randomly selected measurements and so on.

What these plots indicate is that in contrast to Fig. 2, even with 10 random measurements, there is a clear global optimum. Additional measurements, of course, improve its "visibility".

[Figure 13: error surface computed from randomly selected samples]

Henceforth, an extended Kalman filter which processes prerecorded measurements in a random order will be referred to as a random sampling Kalman filter (RSKF).


3.2 Relation to the Gauss-Newton Technique

There is a close relationship between the extended Kalman filter for (2) and the iterative and off-line (batch) Gauss-Newton optimization technique. Both express the measurement nonlinearity as a first order Taylor series expansion so as to obtain equations that are linear in the state/parameter deviation ($dx_k = x_{k+1} - x_k$). Linear solution methods then can be applied.

It is well known that if an extended Kalman filter does not relinearize after each measurement is processed (that is, it continues using the original $\hat{x}$ estimate for generating basis functions and partial derivatives), it can be made to produce results identical to those of a single Gauss-Newton iteration [8]. That is, without relinearization, an extended Kalman filter is a sequential version of the Gauss-Newton technique. It follows that the RSKF described previously can be thought of as a sequential version of a Gauss-Newton iteration where relinearization is performed after each measurement is processed.

While such relinearization offers the potential for faster convergence [8], there are often divergence problems associated with extended Kalman filters. Furthermore, for certain models the filter may converge to points other than the convergence points of the off-line (batch) technique [6,12]. Steps to mitigate the former difficulty are discussed in sec. 3.4. The remainder of this section is devoted to a more detailed exposition of the previous points.
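In the Gauss-Newton approach one starts with N measurement equations modeled as

$$y = h(x) + v \tag{4}$$

where each term is an N x 1 column vector and the statistical properties of $v$ are assumed to justify a least squares fit. Expanding the nonlinearity one has:

$$y \approx h(\hat{x}_k) + \frac{\partial h}{\partial x}(\hat{x}_k)\, dx_k + v \tag{5}$$

The least squares estimate of the deviation, $d\hat{x}_k$, follows from:

$$\min_{dx_k} \| y - h(x) \|^2 \approx \min_{dx_k} \left\| y - h(\hat{x}_k) - \frac{\partial h}{\partial x}(\hat{x}_k)\, dx_k \right\|^2 \tag{6}$$

$$\hat{x}_{k+1} = \hat{x}_k + \left[ \left( \frac{\partial h}{\partial x}(\hat{x}_k) \right)^T \frac{\partial h}{\partial x}(\hat{x}_k) \right]^{-1} \left( \frac{\partial h}{\partial x}(\hat{x}_k) \right)^T \left[ y - h(\hat{x}_k) \right] \tag{7}$$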


It should be noted that in practical use the correction term, $d\hat{x}_k$, is multiplied by a scalar which is optimized during each iteration [3]. We are not considering this.
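A single undamped Gauss-Newton iteration of the kind just described can be sketched as follows. This is an illustration under stated assumptions, not the procedure used for the reported results: the model is taken to be $h = (c_1 + c_2\theta + c_3\theta^2) e^{-\theta^2}$ with $\theta = S(t - T_0)$, the samples are assumed to fall at $t_k = k$, the starting point is an arbitrary one near the true values, and the names `model` and `gauss_newton_step` are hypothetical. The linear least squares problem is solved with `numpy.linalg.lstsq` rather than by forming the normal equations explicitly.

```python
import numpy as np

def model(x, t):
    """Assumed form of (1): h = (c1 + c2*th + c3*th^2) * exp(-th^2), th = S*(t - T0)."""
    c1, c2, c3, S, T0 = x
    th = S * (t - T0)
    f = np.exp(-th**2)
    p = c1 + c2 * th + c3 * th**2
    dh_dth = (c2 + 2.0 * c3 * th - 2.0 * th * p) * f
    # Columns: partials with respect to c1, c2, c3, S, T0.
    J = np.column_stack([f, th * f, th**2 * f, dh_dth * (t - T0), -dh_dth * S])
    return p * f, J                           # N-vector h(x) and N x 5 matrix dh/dx

def gauss_newton_step(x, t, y):
    """Solve the linearized least squares problem for dx, then update x (no step scaling)."""
    h, J = model(x, t)
    dx, *_ = np.linalg.lstsq(J, y - h, rcond=None)
    return x + dx

rng = np.random.default_rng(3)
t = np.arange(1.0, 401.0)                     # assumed sample times
x_true = np.array([1.0, 1.0, -0.1, 0.014, 200.0])
y = model(x_true, t)[0] + rng.normal(0.0, 0.01, t.size)

x0 = np.array([0.9, 1.1, -0.1, 0.0145, 198.0])   # arbitrary start close to the optimum
x1 = gauss_newton_step(x0, t, y)
sse0 = float(np.sum((y - model(x0, t)[0])**2))
sse1 = float(np.sum((y - model(x1, t)[0])**2))
```

Every such iteration evaluates the measurement function and its partial derivatives at all N samples, which is the cost that the later computational comparison is concerned with.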

As was mentioned above, if the $\hat{x}$ used for generating the measurement function matrix, $h(\hat{x})$, and its partial derivatives is held fixed (no relinearization), then the estimate of (7) above can be arrived at through a set of sequential estimator equations. These are derived in the same way that the recursive least squares algorithm can be derived from the off-line (batch) least squares solution [13]. One obtains:

$$\hat{x}_{k+1} = \hat{x}_k + K_{k,i} \left[ y_i - h_i(\hat{x}_0) - H_i(\hat{x}_0)(\hat{x}_k - \hat{x}_0) \right]$$

$$K_{k,i} = P_k H_i^T(\hat{x}_0) \left[ H_i(\hat{x}_0) P_k H_i^T(\hat{x}_0) + R_i \right]^{-1}$$

$$P_{k+1} = \left( I - K_{k,i} H_i(\hat{x}_0) \right) P_k \tag{8}$$

where $h_i(x)$ is the measurement function from (2) and $H_i(x)$ is the vector of partial derivatives of $h_i(x)$ (a row vector in the expressions above). The subscripts on the gain expression, k and i, represent the iteration and randomly chosen measurement, respectively. $R_i$ is the measurement noise covariance matrix and $P_k$ is the error covariance matrix.

The $\hat{x}_{k+1}$ of (8) can be made arbitrarily close to the $\hat{x}$ resulting from a single iteration of (7) on the same k+1 measurements that the sequential estimator (8) has processed if the sequential estimator is properly initialized. This is usually accomplished by setting the initial covariance matrix, $P_0$, to a diagonal matrix with arbitrarily large diagonal terms [13,14]. While in linear sequential estimators $\hat{x}_0$ is often set at zero, this is not appropriate in the nonlinear case where the measurement function partial derivatives are functions of $\hat{x}$. The initial estimate of $x$ is instead used.

The above sequential estimator equations correspond to the "linearized" filter of Gelb [15] for the model (2). That is, $\hat{x}_0$ is the reference estimate (trajectory) and the filter equations are linearized about it. If the equations are relinearized after each measurement ($\hat{x}_0 = \hat{x}_k$), the term $H_i(\hat{x}_0)(\hat{x}_k - \hat{x}_0)$ in the equation for $\hat{x}_{k+1}$ drops out and (8) represents the extended Kalman filter equations for (2).
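The relinearized form of (8), applied to stored data in a random order, can be sketched as below. This is a minimal illustration and not the implementation behind the reported simulations. It assumes the model $h = (c_1 + c_2\theta + c_3\theta^2) e^{-\theta^2}$ with $\theta = S(t - T_0)$, sample times $t_k = k$, and arbitrary initial values for the estimate and covariance; `h_and_H` and `rskf_pass` are hypothetical names. No fictitious measurement noise is included, so convergence is not guaranteed.

```python
import numpy as np

def h_and_H(x, t):
    """Scalar measurement h_i and its row vector of partials H_i for the assumed model."""
    c1, c2, c3, S, T0 = x
    th = S * (t - T0)
    f = np.exp(-th**2)
    p = c1 + c2 * th + c3 * th**2
    dh_dth = (c2 + 2.0 * c3 * th - 2.0 * th * p) * f
    H = np.array([f, th * f, th**2 * f, dh_dth * (t - T0), -dh_dth * S])
    return p * f, H

def rskf_pass(x, P, t, y, R, rng):
    """One pass over the stored data in random order, relinearizing after every sample."""
    traces = [np.trace(P)]
    order = rng.permutation(len(y))           # equiprobable sampling without replacement
    for i in order:
        h, H = h_and_H(x, t[i])               # linearize about the latest estimate
        K = P @ H / (H @ P @ H + R)           # gain; H P H^T is a scalar here
        x = x + K * (y[i] - h)                # estimate update
        P = P - np.outer(K, H) @ P            # covariance update, (I - K H) P
        traces.append(np.trace(P))
    return x, P, order, np.array(traces)

rng = np.random.default_rng(2)
t = np.arange(1.0, 401.0)                     # assumed sample times
x_true = np.array([1.0, 1.0, -0.1, 0.014, 200.0])
y = np.array([h_and_H(x_true, tk)[0] for tk in t]) + rng.normal(0.0, 0.01, t.size)

x0 = np.array([0.5, 0.5, 0.0, 0.02, 150.0])   # arbitrary initial estimate
P0 = np.diag([1.0, 1.0, 1.0, 1e-4, 1e4])      # arbitrary initial covariance
x_hat, P_hat, order, traces = rskf_pass(x0, P0, t, y, 1e-4, rng)
```

The recorded `traces` show the behavior noted in Chapter 2: without system noise the trace of the covariance never increases, whatever the quality of the fit.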
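3.3 The Correction Vectors $d\hat{x}_k$

How do the correction vectors, $d\hat{x}_k$, generated by the RSKF compare to those produced by one of the standard iterative descent techniques?

In an iterative descent technique, the vector of all parameters, at the k+1 iteration, is:

$$\hat{x}_{k+1} = \hat{x}_k - \alpha D(\hat{x}_k) \nabla E(\hat{x}_k) \tag{9}$$

where $\nabla E(\hat{x}_k)$ is the gradient of the error function, $D(\hat{x}_k)$ is a matrix depending on the particular descent technique employed and the scalar $\alpha$ is chosen small enough to assure convergence. For a square error criterion,

$$\nabla E(\hat{x}_k) = \nabla \sum_{j=1}^{N} \left[ y_j - h_j(\hat{x}_k) \right]^2 = -2 \sum_{j=1}^{N} H_j^T(\hat{x}_k) \left[ y_j - h_j(\hat{x}_k) \right] \tag{10}$$

where $H_j(\hat{x}_k)$ is the matrix of partial derivatives of $h_j$ with respect to the parameters.

Thus the iterative descent technique is

$$\hat{x}_{k+1} = \hat{x}_k + 2 \alpha D(\hat{x}_k) \sum_{j=1}^{N} H_j^T(\hat{x}_k) \left[ y_j - h_j(\hat{x}_k) \right] \tag{11}$$

while the RSKF is

$$\hat{x}_{k+1} = \hat{x}_k + P_k H_i^T(\hat{x}_k) \left[ H_i(\hat{x}_k) P_k H_i^T(\hat{x}_k) + R_i \right]^{-1} \left[ y_i - h_i(\hat{x}_k) \right]$$

$$P_{k+1} = \left( I - K_{k,i} H_i(\hat{x}_k) \right) P_k \tag{12}$$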
In the RSKF estimation equation, (12), the magnitude of the correction vector, $d\hat{x}_k$, is inversely proportional to a measure of the uncertainty due to both the measurement noise and the uncertainty in the current estimate. The matrix $P_k$ weights the correction vector components so that directions corresponding to greater uncertainty are favored. Most importantly, the vector summation operation of 'improvement' vectors, $H_j^T(\hat{x}_k)$, weighted by the pointwise errors, $(y_j - h_j(\hat{x}_k))$, has been replaced in the RSKF by only a single such term.

What can be said about this approximation as it affects the RSKF? By itself, minimization of the error at a single point (given more than one linear parameter) is an underdetermined problem. The intention is that over a number of iterations, that is on average, the direction of the correction vectors will be towards decreasing error, with respect to the error surface of all the available measurements.

At each iteration of the RSKF the single measurement that is processed leads to a correction vector. Depending on which measurement is randomly chosen for processing, different correction vectors will result. It might initially be believed that they would all be roughly of the same direction and magnitude but this is not necessarily true. Individual iterations of the RSKF may produce estimates worse than the previous estimates in spite of a trend of reduced error over a number of iterations.

One might also suppose that at least close to the error minimum, the possible correction vectors would have similar characteristics. While this is true to some extent, very close to the error minimum it becomes important to appreciate the effect of the measurement noise in the RSKF equations.

There are two sources accounting for the pointwise errors:

$$y_1 - h_1(\hat{x}_k), \; y_2 - h_2(\hat{x}_k), \; \ldots, \; y_N - h_N(\hat{x}_k)$$

One is the measurement noise. Even for the optimal $\hat{x}$, the differences are non-zero. This can be thought of as a random error. The second is due to using an estimate of the parameters that is not the optimal estimate. This can be thought of as a systematic error. Such concepts are discussed in the regression and filtering literature [11].

If one's estimates are far from the optimal parameter estimates this second source of error dominates. But close to the optimal parameter estimates the measurement noise is more appreciable. As the optimum is approached, the pointwise errors above take on the statistical characteristics of the measurement noise. That is, they are approximately zero mean Gaussian random variables of variance $\sigma^2$.

Since the correction vectors are proportional to the pointwise errors, close to the optimum they will be pointing 180° away from the direction they would have, had there been no measurement noise, almost 50% of the time (see Appendix A). Also the correction vectors' magnitude becomes more dependent on the measurement noise.
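The "almost 50%" figure can be illustrated with a scalar calculation. This sketch is not taken from Appendix A. It assumes that a pointwise error is the sum of a fixed systematic part $e$ and Gaussian noise $v \sim N(0, \sigma^2)$, so that the correction reverses direction whenever the noise changes the sign of $e + v$; the probability of that event is $\Phi(-|e|/\sigma)$. The function name `flip_probability` is hypothetical.

```python
import math

def flip_probability(systematic_error, sigma):
    """P[sign(e + v) != sign(e)] for v ~ N(0, sigma^2), i.e. Phi(-|e| / sigma)."""
    z = abs(systematic_error) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

sigma = 0.01                                  # noise standard deviation, sigma^2 = 1e-4
# Systematic errors shrinking toward zero as the optimum is approached.
probs = {e: flip_probability(e, sigma) for e in (0.1, 0.01, 0.001, 0.0001)}
```

Far from the optimum (systematic error ten times the noise level) a reversal is essentially impossible, while for a systematic error of one hundredth of the noise level the probability is already above 0.49.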
This in itself is not fatal to the RSKF's operation. The same sort of statement could be made concerning clearly optimal linear estimation procedures. It is mentioned to show the effect of the measurement noise on the RSKF's correction vectors.

Finally, with a qualification, the RSKF is as likely to converge upon a local minimum as any iterative descent technique. The qualification is that because the RSKF does not move strictly in the direction of decreasing error, it conceivably could leave the region of a local minimum. Conceivably too, an unfortunate choice of correction vector could place the estimates close to a local minimum.


3.4 The RSKF with Fictitious Measurement Noise

In Chapter 2 it was pointed out that if the measurements are causally processed through an extended Kalman filter, divergence will occur simply because many values of parameter estimates provide almost equally good fits. While random sampling eliminates this difficulty, divergence can still occur for other reasons.

Divergence in extended Kalman filters occurs when error sources that are unaccounted for do, in reality, exist so that the error covariance matrix elements take on unrealistically low values. One very simple modification of the filter equations, described below, has been used with some success in practice to overcome this problem.

A first order approximation of the effect of uncertainties in the parameter estimates upon the measurement equation is:

$$y_i = h_i(\hat{x}_k) + H_i(\hat{x}_k)\, dx_k + v_i \tag{13}$$

This is the measurement model used in both Gauss-Newton optimization and the extended Kalman filter where

$$\hat{x}_{k+1} = \hat{x}_k + d\hat{x}_k \tag{14}$$

and one can linearly solve for $d\hat{x}_k$ and hence $\hat{x}_{k+1}$ [3].

However, proceeding with the philosophy used in Tenney [2], $dx_k$ can be considered to be a zero mean Gaussian random vector whose covariance matrix is the already available parameter estimate covariance matrix, $P_k$. Assuming that the fictitious measurement noise is independent of the actual measurement noise, the new measurement noise covariance matrix is:

$$R_i' = R_i + H_i(\hat{x}_k) P_k H_i^T(\hat{x}_k) \tag{15}$$

But the second term in the sum is already available in the Kalman gain denominator so that one has:

$$K_{k,i} = P_k H_i^T(\hat{x}_k) \left[ 2 H_i(\hat{x}_k) P_k H_i^T(\hat{x}_k) + R_i \right]^{-1} \tag{16}$$

This modification can be generalized. Consider the gain denominator to consist of two measures of uncertainty, one due to the measurement noise and one due to the inaccuracy in the current parameter estimates. The nominal weighting of these two quantities is 1:1. A weighting of 2:1 as indicated in (16) corresponds to the statistical model of (13). But certainly other weights are possible. That is, we might use

$$K_{k,i} = P_k H_i^T(\hat{x}_k) \left[ a_k H_i(\hat{x}_k) P_k H_i^T(\hat{x}_k) + R_i \right]^{-1} \tag{17}$$

where $a_k$ is suitably chosen.

As $a_k$ is made greater than one, the rate of filter convergence decreases as the filter behaves more cautiously, believing an increasing amount of uncertainty is present in the measurements. Specifically, when the RSKF's estimates are far from the optimal estimates, $H_i(\hat{x}_k) P_k H_i^T(\hat{x}_k)$ should be much greater than $R_i$ so that the magnitude of the correction vector is $1/a_k$ of what it would be had $a_k = 1$.
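The $1/a_k$ scaling can be checked with a small numerical sketch. The covariance, partial derivative vector and noise variance below are made-up values chosen only so that the parameter-uncertainty term dominates the measurement noise; `rskf_gain` is a hypothetical name for the weighted gain expression.

```python
import numpy as np

def rskf_gain(P, H, R, a=1.0):
    """Gain with the parameter-uncertainty term H P H^T weighted by a (a = 2 is the 2:1 case)."""
    return P @ H / (a * (H @ P @ H) + R)

P = np.diag([1.0, 1.0, 1.0, 1e-4, 1e4])      # hypothetical covariance, far from the optimum
H = np.array([0.8, 0.3, 0.1, 40.0, -0.01])   # hypothetical row of partial derivatives
R = 1e-4                                      # measurement noise variance

base = np.linalg.norm(rskf_gain(P, H, R))
ratios = {a: float(np.linalg.norm(rskf_gain(P, H, R, a)) / base)
          for a in (1.0, 2.0, 5.0)}
```

Because the correction vector is the gain times the pointwise error, the same ratios apply to the correction magnitude. As the estimate covariance shrinks toward the noise level the ratios move back toward one, so the weighting matters most early in the processing.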


Examples of the filter's performance using different values of $a_k$ are presented in Chapter 4.

Although a weight of $a_k = 2$ corresponds to the statistical model of (13), one should not conclude that this is the best possible $a_k$ setting. Rather the model of (13) is a heuristic device for adding a (not necessarily minimal or even sufficient) amount of fictitious measurement noise in order to prevent divergence.

The statistically unsatisfactory nature of (13) is twofold. First, even though the fictitious measurement noise random variables, $dx_k$, and the parameter uncertainty (described by $P_k$) are the same random variables, in going from (15) to (16) we implicitly assume that they are uncorrelated. Secondly, the effect of parameter uncertainties in the partial derivative evaluations has not been included.

The former point can be demonstrated explicitly using Schmidt's [16] model for including the effect of parameter uncertainties in the measurement equation:

$$Y = M(X, V, t) + q(t)$$

$$y = \frac{\partial M}{\partial X}\, x + \frac{\partial M}{\partial V}\, v + q(t)$$

$$y = H'(t)\, x + G'(t)\, v + q(t)$$

$$E(v) = 0, \quad E(q) = 0, \quad E(q q^T) = Q$$

$$C = E[(x - \hat{x}) v^T], \quad P = E[(x - \hat{x})(x - \hat{x})^T]$$

Here $y$, $x$ and $v$ are the deviations from the nominal estimates of $Y$, $X$ and $V$. These are the measurements, states (parameters) and uncertain parameters, respectively.

The resulting gain and covariance updating expressions are:

$$K = (P H'^T + C G'^T)(H' P H'^T + H' C G'^T + G' C^T H'^T + G' W G'^T + R)^{-1}$$

$$P = [I - K H'] P - K G' C^T$$

These reduce to the expressions for the gain in (16) and the covariance in (8) if $C = 0$, $W = P$ and $G' = H'$. These are the assumptions under which Schmidt's model is equivalent to (13).
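The reduction can be written out as a check. This is a sketch that takes the gain and covariance expressions as given above and substitutes only $C = 0$, $W = P$ and $G' = H'$; the subscript "new" for the updated covariance is notation introduced here.

$$K = (P H'^T + 0)\,(H' P H'^T + 0 + 0 + H' P H'^T + R)^{-1} = P H'^T \left( 2 H' P H'^T + R \right)^{-1}$$

$$P_{new} = [I - K H'] P - K H' \cdot 0 = [I - K H'] P$$

The factor of two in the gain denominator is the 2:1 weighting of (16), and the covariance update has the same form as the last line of (8).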

We also mentioned that the effect of parameter uncertainties in the partial derivative evaluations has not been included. Accounting for it using a measurement noise formulation is not directly possible. If one expands the partial derivatives in the linearized measurement equation (13) in a Taylor series about the current estimate, the first order terms in the series (which are second order partial derivatives) are multiplied by $dx_k$. However, as discussed in sec. 3.2, the extended Kalman filter implicitly assumes this term to be zero. To be more specific, it assumes $dx(k+1/k)$, that is $dx$ at k+1 given k measurements, to be zero and linearly solves for $dx(k+1/k+1)$.

Considering $dx_k$ to be a random variable, as in sec. 3.4, becomes statistically involved. The following quadratic term must then be added to the linearized measurement equation

$$dx^T \, \frac{\partial^2 h_i}{\partial x^2}(\hat{x}_k) \, dx_k$$

where $dx$ comes from the partial derivative expansion and $dx_k$ from the measurement function expansion.

Whether one considers these to be the same random vector ($N(0, P_k)$) or uncorrelated random vectors with identical distributions, utilizing the resulting non-Gaussian distribution is a problem.

3.5 The Matrix $P_k$

In the previous discussion of the RSKF little has been said about the matrix $P_k$ except that it favorably weights correction vector components towards the region of greatest uncertainty. It has no strict statistical meaning in the RSKF. It arises naturally in linear Kalman filtering since it and $\hat{x}_k$ describe the statistical distribution of the current parameter estimate as being $N(\hat{x}_k, P_k)$ when the system and observation noise is additive and Gaussian.


For nonlinear filtering using an excellent reference trajectory one might hope that $P_k$ is still meaningful. The often-encountered Kalman filter divergence indicates that this is not necessarily so. One is attempting to represent the non-Gaussian conditional distribution function (the estimate distribution conditioned on the data received up till the present iteration) with only two moments. In the problem described by (1), a good reference trajectory is not assumed to be available. This makes matters worse. One's only hope is that after an initial rapid convergence to a neighborhood near the optimum, the conditional density is asymptotically Gaussian through a central limit type operation. The empirical success of the 'fictitious' measurement noise suggests that considering $dx_k$ as $N(0, P_k)$ may indeed be meaningful. The random sampling will also tend to eliminate correlations in the error (due to inaccuracy in the current estimate) between samples that are adjacent in real time. However, this should be considered as vague speculation. At this time we can say nothing definite.

3.6  Computational  Considerations 
Why would one employ an estimator such as the RSKF instead of
one of the well known iterative minimization techniques? One
reason might be computational.

This involves the difference between an RSKF "iteration" and an
iterative minimization technique "iteration". During the former
only a single pointwise error evaluation and partial derivative
matrix calculation is made. In the latter type of "iteration",
such quantities are calculated for all of the available data.

Even if the latter algorithm converges in many fewer of its
"iterations", each one involves many more nonlinear function
evaluations. This computational cost could be significant.

Furthermore, the RSKF may be stopped when sufficiently accurate
estimates are obtained. Although the waveform measurements need be
stored off-line, not all of them need be processed.
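The contrast between the two kinds of "iteration" can be made concrete with a minimal sketch. It is not the author's program: the names `rskf`, `h` and `jac` are hypothetical, the measurement is taken to be scalar, and the gain denominator is assumed to have the form a*H*P*H^T + R, which is only a reading of the weighted denominator mentioned in the text. Each pass through the loop uses one randomly chosen measurement, relinearizes there, and may stop before all the data are used.

```python
import numpy as np

def rskf(h, jac, t, y, x0, P0, R, a=2.0, max_iter=None, seed=0):
    """Random-order sequential (extended Kalman) parameter update.

    h(x, t)   -> scalar model value         (hypothetical callable)
    jac(x, t) -> 1-D array of dh/dx         (hypothetical callable)
    a         -> ASSUMED weight on H P H^T in the gain denominator
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    P = np.asarray(P0, dtype=float).copy()
    order = rng.permutation(len(y))       # random order, without replacement
    if max_iter is not None:
        order = order[:max_iter]          # not all stored data need be used
    for k in order:
        H = jac(x, t[k])                  # relinearize at the current estimate
        PHt = P @ H
        denom = a * (H @ PHt) + R         # assumed form of the denominator
        K = PHt / denom                   # gain vector
        x = x + K * (y[k] - h(x, t[k]))   # one pointwise error evaluation
        P = P - np.outer(K, PHt)          # classical covariance update
    return x, P, order

# Usage on a linear model with a = 1, where the loop is ordinary
# recursive least squares and the processing order does not matter.
t = np.linspace(0.0, 1.0, 50)
y = 1.5 - 0.7 * t + 0.01 * np.sin(37.0 * t)   # small deterministic disturbance
x_hat, P, order = rskf(lambda x, s: x[0] + x[1] * s,
                       lambda x, s: np.array([1.0, s]),
                       t, y, x0=[0.0, 0.0], P0=1e4 * np.eye(2), R=1e-4, a=1.0)
```

For a nonlinear `h` and a weight larger than one, the same loop corresponds to the situation discussed in the text; an iterative minimization technique would instead evaluate `h` and `jac` at all 50 points before making a single correction.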

To examine the question of computation time in some detail, let:

$N$ = the number of measurements.

IMT = iterative minimization technique (I.M.T.) of the form (8).

$M_{F/D}$ = the number of computational units necessary to evaluate
$h_i(\hat{x}_k)$ and the transposed partial derivative matrix at
$\hat{x}_k$. Henceforth, a computational unit is assumed to be a
single multiplication.

$M_{IMT}$, $M_{RSKF}$ = the number of computational units necessary to
perform one iteration of an IMT or RSKF, respectively, disregarding
$M_{F/D}$ and logic costs (program loops, etc.).

$I_{IMT}$, $I_{RSKF}$ = the number of iterations for an IMT or RSKF,
respectively, to sufficiently converge.

$t_{IMT}$, $t_{RSKF}$ = the time necessary for each of the two techniques
to converge, expressed in computational units.

It will be assumed that it is sufficient to compare only the
number of multiplications and function/derivative evaluations. In
a first analysis logic costs will also be neglected. We have:


$$
\begin{aligned}
t_{IMT} &= I_{IMT}\,(N \cdot M_{F/D} + M_{IMT}) \\
t_{RSKF} &= I_{RSKF}\,(M_{F/D} + M_{RSKF})
\end{aligned}
\tag{18}
$$

How large must $I_{IMT}$ be before the RSKF becomes computationally
more economical? This can be found by letting $t_{IMT} = t_{RSKF}$, or

$$I_{IMT} = \frac{M_{F/D} + M_{RSKF}}{N \cdot M_{F/D} + M_{IMT}}\; I_{RSKF} \tag{19}$$
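The cost comparison of (18) and (19) is simple enough to tabulate directly. The sketch below uses hypothetical function names and purely illustrative numbers (none of them come from the text) and confirms that the two total costs coincide at the break-even iteration count.

```python
def t_imt(i_imt, n, m_fd, m_imt):
    """Total cost of an iterative minimization technique, as in (18)."""
    return i_imt * (n * m_fd + m_imt)

def t_rskf(i_rskf, m_fd, m_rskf):
    """Total cost of the RSKF, as in (18)."""
    return i_rskf * (m_fd + m_rskf)

def break_even_i_imt(i_rskf, n, m_fd, m_rskf, m_imt):
    """Number of IMT iterations at which both costs are equal, as in (19)."""
    return (m_fd + m_rskf) / (n * m_fd + m_imt) * i_rskf

# Illustrative values only (assumed for this example).
n, m_fd, m_rskf, m_imt, i_rskf = 1000, 50.0, 500.0, 20.0, 1000
i_star = break_even_i_imt(i_rskf, n, m_fd, m_rskf, m_imt)
cost_imt = t_imt(i_star, n, m_fd, m_imt)
cost_rskf = t_rskf(i_rskf, m_fd, m_rskf)
```

An IMT that needs more than `i_star` iterations costs more than the RSKF under this model; one that needs fewer costs less.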


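The simplest iterative technique of the form (8) is steepest
descent (for which the weighting matrix in (8) is the identity). Here

$$N \cdot M_{F/D} \gg M_{IMT} \tag{20}$$

and we will assume

$$M_{RSKF} \gg M_{F/D} \tag{21}$$

Then (19) reduces to

$$I_{IMT} = \frac{M_{RSKF}}{M_{F/D}} \cdot \frac{I_{RSKF}}{N} \tag{22}$$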


The inclusion of logic costs will increase $M_{IMT}$ and $M_{RSKF}$, though
the increase in the latter will be most significant, both because of
the more complex computational structure of the RSKF and because
$M_{RSKF}$ is a larger fraction of $(M_{F/D} + M_{RSKF})$ than $M_{IMT}$ is of
$(N \cdot M_{F/D} + M_{IMT})$, assuming that the inequalities of (20) and (21)
are true. In Chapter 4 the inclusion of logic costs in a 5 parameter
problem will be seen to increase $M_{RSKF}$ by a factor of three.

Asymptotically, as $N$ and/or $M_{F/D}$ increases, the RSKF is favored.

The asymptotic effect of an increase in the number of parameters is
more involved, not only because the convergence behavior may be
affected, but also because the increase in $M_{F/D}$ and $M_{IMT}$ will be
favorable to the RSKF while that in $M_{RSKF}$ will not be.

There are a number of other computationally related issues that
involve the rate of convergence and accuracy of the estimate one
obtains.

Numerically, the classical Kalman filter equations are known
to be unreliable [4]. Our simulations have been obtained using
double precision on an IBM 370 machine. There is no conceptual
difficulty with utilizing the more recently developed Kalman filter
formulations with improved numerical properties [4], though
additional computation is involved.

Specifically, the term that is inverted in (17) does not appear
in the inverse form of the Kalman filter equations, the square root
covariance filter or the square root information filter. Because of
this the use of the scalar $a_k$, as in (17), is not possible.
However, $R$ can be replaced during each iteration with the increased
$R_{new}$ of (15). The evaluation of $R_{new}$ requires about  multiplications,
where $n$ is the number of parameters.

Since a random sequence determines the processing order, one
might ask if there is an optimal order. Could the randomly selected
samples be concentrated in certain sections of the waveform that
contained the most information about the parameters so as to achieve
a faster convergence? Alternately, given the waveform, what are
the sampling locations which provide minimum variance estimates?

Unfortunately, expressions for the trace of the error covariance
matrix, while independent of the parameters for a linear estimation
problem, do indeed depend on the parameters to be estimated for
nonlinear models.

Therefore,  this  approach  could  only  be  viable  if  very  good  a 
priori  estimates  of  the  nonlinear  parameters  are  already  available. 

It could be employed once the filter had sufficiently converged,
but there is a real question concerning the cost of the extra
computation needed to minimize the trace expression.

Another point is that if it is possible to process only one
measurement per iteration, it is also possible to process several
simultaneously during each iteration. This could reduce the number
of iterations required for convergence, but the additional
measurement equations increase the Kalman filter's computational complexity.
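The added complexity of simultaneous processing shows up as an m-by-m matrix that must be inverted when m measurements are used per iteration. The following is a minimal sketch of the standard linear Kalman measurement update (not the weighted RSKF update; the function name `batch_update` and all numbers are illustrative assumptions). For a linear model with uncorrelated measurement noise, one simultaneous update gives the same result as two scalar updates.

```python
import numpy as np

def batch_update(x, P, H, y, R):
    """Standard Kalman update for m simultaneous measurements.

    H: (m, n) measurement matrix, y: (m,) measurements, R: (m, m) noise cov.
    """
    S = H @ P @ H.T + R                  # m x m matrix that must be inverted
    K = np.linalg.solve(S, H @ P).T      # K = P H^T S^{-1} (P and S symmetric)
    x_new = x + K @ (y - H @ x)
    P_new = P - K @ H @ P
    return x_new, P_new

x0 = np.zeros(2)
P0 = 10.0 * np.eye(2)
H = np.array([[1.0, 0.2], [1.0, 0.9]])
y = np.array([1.1, 0.4])
R = 0.05 * np.eye(2)

# Both measurements at once.
xa, Pa = batch_update(x0, P0, H, y, R)

# The same two measurements, one per iteration.
xb, Pb = batch_update(x0, P0, H[:1], y[:1], R[:1, :1])
xb, Pb = batch_update(xb, Pb, H[1:], y[1:], R[1:, 1:])
```

In a nonlinear filter the two routes generally differ, because the one-at-a-time version relinearizes between measurements while the simultaneous version does not.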

Conceivably  one  could  also  process  measurements  that  had  already 
been  used.  In  the  simulations  presented  in  the  following  chapter 
the  RSKF  made  only  a  single  pass  through  the  data.  While  this 
seemed  adequate,  additional  passes  through  the  data  could  have  been 
made in order to refine the estimate.

Chapter 4

An Application

4.1  Introduction 


Sensors are available that are capable of measuring a ship's
magnetic field intensity. Sunahara [5] has considered sensors
placed under a harbor entrance for traffic control purposes.
Aircraft engaged in anti-submarine activity may also carry such
sensors. In this case, the submarine is assumed stationary and the
sensor moves in a straight line relative to it.

The magnetic field of the ship can be modeled as originating
from a magnetic dipole [6]. The field intensity measured by the sensor
can in turn be approximated by some "suitable" deterministic
function. Sunahara, who was interested primarily in detecting this
"signal" in the presence of substantial noise, modeled the field
intensity waveform as a single cycle of a sine function of unknown
amplitude and phase. Another representation for the field
intensity, in a simplified form, is:


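$$
\begin{aligned}
h(\theta_k) &= (1 + \theta_k^2)^{-5/2}\,\left[c_1 + c_2\theta_k + c_3\theta_k^2\right] \\
\theta_k &= S\,(t_k - T_0)
\end{aligned}
\tag{23}
$$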


This is somewhat representative of other tracking models in that
the location parameter, $T_0$, represents the time at which the
closest point of approach (CPA) between ship and sensor is
made and the scale parameter, $S$, is proportional to the velocity
of the moving object and inversely proportional to the distance
between ship and sensor at CPA.

One artifice employed to insure a satisfactory filter performance
was to replace $S$ with $e^p$ and estimate $p$. This replaces the
physically constrained $S$ ($S > 0$) with the unconstrained $p$ (otherwise,
the filter may converge to a local minimum with a negative scale
estimate). The variance in (25) is that of $p$.
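A sketch of how the reparameterized model and its partial derivatives might be coded is given below. It assumes the model has the form h = (c1 + c2*theta + c3*theta^2) / (1 + theta^2)^(5/2) with theta = S*(t - T0), which is a reading of (23) rather than a verified transcription, and it assumes the parameter ordering (c1, c2, c3, p, T0) with S = exp(p). The function names are hypothetical.

```python
import numpy as np

def anderson_h(x, t):
    """Assumed model; x = (c1, c2, c3, p, T0) with scale S = exp(p) > 0."""
    c1, c2, c3, p, T0 = x
    theta = np.exp(p) * (t - T0)
    return (c1 + c2 * theta + c3 * theta**2) * (1.0 + theta**2) ** -2.5

def anderson_jac(x, t):
    """Analytic partial derivatives of anderson_h with respect to x."""
    c1, c2, c3, p, T0 = x
    S = np.exp(p)
    theta = S * (t - T0)
    q = (1.0 + theta**2) ** -2.5
    g = c1 + c2 * theta + c3 * theta**2
    # Chain rule through theta for the two nonlinear parameters.
    dh_dtheta = (c2 + 2.0 * c3 * theta) * q - 5.0 * theta * g * q / (1.0 + theta**2)
    return np.array([q,                  # dh/dc1
                     theta * q,          # dh/dc2
                     theta**2 * q,       # dh/dc3
                     dh_dtheta * theta,  # dh/dp,  since d(theta)/dp  = theta
                     -dh_dtheta * S])    # dh/dT0, since d(theta)/dT0 = -S

x = np.array([1.0, -0.5, 0.3, np.log(2.0), 0.1])   # illustrative values only
J = anderson_jac(x, 0.4)
```

Because the filter corrects p rather than S, any real-valued correction still yields a positive scale exp(p), which is the point of the artifice.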

4.2  Computational  Comparison 

Based on expressions for the computational requirements of
Kalman filters appearing in Mendel's work [7], one finds that for
a five parameter, single measurement equation problem, approximately
600 computational units are necessary for multiplication and 1500
if logic costs are included. (A logic operation is assumed to be
ten times faster than a multiplication.) The minimum number of
computational units necessary to evaluate (23) and the associated
partial derivatives is about 64 (see Appendix C).

Assuming that $M_{F/D} \gg M_{IMT}$ and $I_{RSKF} = N = 400$, the number of
IMT iterations at which the RSKF becomes computationally competitive
is:

$$\frac{M_{F/D} + M_{RSKF}}{M_{F/D}} = \frac{1564}{64} \approx 25 \tag{24}$$

if logic costs are included.
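As a check on the arithmetic, and assuming that $M_{IMT}$ is negligible and that the RSKF makes a single pass through the data ($I_{RSKF} = N$), the break-even ratio can be evaluated from the figures quoted above, both with and without logic costs:

```latex
\frac{M_{F/D} + M_{RSKF}}{M_{F/D}}
  = \frac{64 + 600}{64} \approx 10.4
  \quad \text{(multiplications only)},
\qquad
\frac{M_{F/D} + M_{RSKF}}{M_{F/D}}
  = \frac{64 + 1500}{64} = \frac{1564}{64} \approx 24.4
  \quad \text{(logic costs included)}
```

The second value is the one quoted as 25 in (24).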

4.3  An  Example 

Several  simulations  were  performed  using  an  IBM  370  with  double 
precision.  The  classical  extended  Kalman  filter  equations  were 
used with the parameters:

4.3.1  The Weight $a_k = 2.0$

Figs.  14-20  illustrate  the  operation  of  the  RSKF  with  the  weight 
in  the  Kalman  gain  denominator  set  equal  to  two. 

Fig. 14 illustrates the received waveform and the waveform
corresponding to the initial parameter estimates.

Fig. 15 is a plot of a measure of the error of the parameter
estimates. Specifically, the following quantity is plotted for each
iteration.

[Equation (26): parameter-error measure; expression illegible in the source.]

Fig. 16 is a plot in the nonlinear state space of the
convergence of $S$ and $T_0$. It can be seen that the correction vectors
generally decrease in magnitude as the optimum is approached.
Furthermore, although the general trend of the trajectory is to approach
the optimal parameter estimate, it can be seen that individual
correction vectors often do not point toward it. This leads to a number
of "knots" and "loops" in the trajectory.

Fig. 17 is a plot of the normalized covariance trace:

$$\frac{1}{5}\left[\frac{P_k(1,1)}{P_0(1,1)} + \frac{P_k(2,2)}{P_0(2,2)} + \frac{P_k(3,3)}{P_0(3,3)} + \frac{P_k(4,4)}{P_0(4,4)} + \frac{P_k(5,5)}{P_0(5,5)}\right] \tag{27}$$
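Assuming (27) is the average of the five diagonal covariance entries, each divided by its initial value, the plotted quantity could be computed as in this small sketch. The function name and the sample matrix are illustrative only.

```python
import numpy as np

def normalized_trace(P_k, P_0):
    """Mean of the diagonal entries of P_k, each scaled by its initial value."""
    return float(np.mean(np.diag(P_k) / np.diag(P_0)))

P_0 = np.diag([4.0, 1.0, 9.0, 0.25, 2.0])   # illustrative initial covariance
start = normalized_trace(P_0, P_0)          # value at the first iteration
halved = normalized_trace(0.5 * P_0, P_0)   # after every variance has halved
```

The scaling makes the measure start at one regardless of the units of the individual parameters, so a decay toward zero can be read directly from the plot.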

Fig. 18 displays the filter innovations $(y_k - h_k(\hat{x}_k))$. They
seem to settle out at about 100-150 iterations. This is consistent
with Fig. 17.

Fig. 19 displays the Kalman gain denominator. Since the
measurement noise covariance component of the denominator is at about
zero on the scale of the graph, one is seeing only the other
component, the uncertainty due to the uncertainty in the current
parameter estimate. This term appears to have a 'spiky' nature. It
generally decreases in amplitude.


Finally, Fig. 20 is the received waveform along with the
waveform corresponding to the final parameter estimates:

mean

variance    $.23 \times 10^{-3}$    (28)


These  may  be  compared  to  the  actual  parameter  values  in  (25). 

The next eight plots (Figs. 21-28) illustrate the population of
correction vectors that the RSKF randomly samples from. Each plot
corresponds to a particular iteration of the previous example
($a_k = 2.0$). The common point shared by all the vectors is the current
estimate in the nonlinear state space. Each vector corresponds to
the particular correction vector that would result if one of the
400 measurements were processed during the iteration.

Each  plot  shows  400  possible  correction  vectors.  However,  this 
is  merely  for  illustrative  purposes.  In  actual  operation  the  RSKF 
samples  the  measurements  without  replacement.  This  means  that  the 
size  of  the  sample  population  decreases  by  one  after  each  iteration. 

The  most  important  feature  in  these  plots  is  that  the  possible 
correction  vectors  generally  do  not  point  in  the  same  direction. 

Also, as the processing continues, the fluctuation in the
magnitudes of adjacent correction vectors tends to increase. This is
due to the increased effect of the measurement noise on the
correction vector magnitude (and direction if there is a sign change).

4.3.2  Suboptimal  Weights 

The  next  several  plots  will  illustrate  the  effect  of  the  use 
of  different  weights  in  the  Kalman  gain  denominator. 

Figs. 29 and 30 are plots of the normalized covariance trace
when $a_k = 4.0$ and $a_k = 8.0$, respectively. These may be compared to
Fig. 17 where $a_k = 2.0$. The slower convergence that results from
increasing the weight beyond 2:1 is evident in the longer time it takes
for the trace to converge to zero.
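A rough scalar calculation suggests why larger weights slow the decay of the covariance. It rests on two assumptions that are a reading of the text rather than a quotation of (17): that the gain denominator is $a_k H P_k H^T + R$, and that the usual covariance update $P_{k+1} = P_k - K H P_k$ is used. For a single parameter with $R$ negligible compared with $H^2 P_k$:

```latex
K = \frac{P_k H}{a_k H^2 P_k + R} \approx \frac{1}{a_k H},
\qquad
P_{k+1} = P_k - K H P_k \approx \left(1 - \frac{1}{a_k}\right) P_k
```

Under these assumptions each update retains about 1/2 of the variance for $a_k = 2$, 3/4 for $a_k = 4$ and 7/8 for $a_k = 8$, while $a_k = 1$ removes essentially all of it in a single step.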

The filter's performance with $a_k = 1.0$ is illustrated in Figs.
31, 32 and 33. Fig. 31 shows that a reasonable fit has not been
achieved. In Fig. 32 it can be seen that the trajectory of the
nonlinear parameter estimates has failed to correctly converge. As
previously mentioned, the problem is that of nonlinear filter
divergence. The filter takes an overly optimistic view of the accuracy
of its estimates. The very rapid decrease in the normalized covariance
trace of Fig. 33 in comparison to that of Fig. 17 confirms this.

Chapter 5
Conclusion

An application of off-line processing by means of a sequential
estimator has been presented. The technique is essentially
a sequential version of Gauss-Newton optimization with fictitious
measurement noise added to prevent filter divergence.

The RSKF described does not implicitly make use of the form of
the model (1), its separability and most of the associated
assumptions. Therefore, it would appear to be possible to use it for
other non-linear square error regression problems and, in particular,
for curve fitting. Such a method would seem to be most appropriate
in situations requiring fast off-line processing.

However, computational savings, if they are possible at all,
will depend on the complexity of the model being employed. Also,
the extent to which convergence can be guaranteed has not been
sufficiently delineated. Furthermore, the numerical properties of the
filter equations are crucial to the feasibility of their
implementation in a limited precision computer.

Appendix A


At first glance it may appear that the well known optimization
maxim, that one's estimates should proceed in a direction of
nonincreasing error, is being violated. The confusion apparently arises
because a distinction must be made between the error surface based
on all the data that will eventually be received and the error
surface based on only the data received up to the point of the current
iteration. There is no reason to think that one's estimate should
always be moving toward an optimal point on an error surface that
depends on data not yet received.

To illustrate this, consider the sequential estimation of the
stationary mean of a sequence of N i.i.d. Gaussian random variables.

$$\hat{x}_{k+1} = \frac{k-1}{k}\,\hat{x}_k + \frac{1}{k}\,y_k \tag{1}$$

$$\hat{x}_{k+1} = \hat{x}_k + \frac{1}{k}\,(y_k - \hat{x}_k) \tag{2}$$

At each iteration $\hat{x}_k$ is clearly the optimal (say square
error) estimate on the error surface of data received so far. However,
as $\hat{x}_k$ converges toward the actual mean, (2) is adding what is almost
a zero mean (actually $\frac{\mu - \hat{x}_k}{k}$, with $\mu$ the actual mean)
Gaussian random variable to the
current estimate in order to produce the updated estimate.
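The running-mean example is easy to simulate. The sketch below uses arbitrary illustrative values for the mean, the sample size and the random seed; it applies the recursive update one sample at a time and records how often a step moves the estimate away from the final (full-data) estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, N = 3.0, 2000                       # illustrative values only
y = mu + rng.standard_normal(N)

x_hat = 0.0
estimates = []
for k, y_next in enumerate(y):          # k = number of samples already used
    x_hat = x_hat + (y_next - x_hat) / (k + 1)   # recursive mean update
    estimates.append(x_hat)
estimates = np.array(estimates)

final = estimates[-1]                   # optimum of the full-data error surface
dist = np.abs(estimates - final)
# Fraction of steps (excluding the last) that increased the distance
# to the full-data optimum.
moved_away = float(np.mean(dist[1:-1] > dist[:-2]))
```

Every intermediate estimate is optimal for the data seen so far, yet a substantial fraction of the individual steps move away from the point that is optimal for all of the data.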

Therefore, with respect to the error surface of all the data
to eventually be received, the estimate moves away from the
optimum point of that surface a percentage of times that
asymptotically approaches 50%. Put another way, the current estimate moves

Appendix B


The rank condition can be violated in models such as

$$y(t_k) = c_1 e^{B_1 t_k} + c_2 e^{B_2 t_k} + \dots + c_m e^{B_m t_k}$$

The problem is that columns in the matrix of basis functions
may become linearly dependent, reducing the matrix rank. In the
above model this will occur if $\hat{B}_i = \hat{B}_j$.

In the Anderson model the basis functions are linearly
independent for any $\hat{S}$ and $\hat{T}_0$, and so, analytically, present no problem.
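Both statements can be checked numerically with a short sketch. The exponential basis follows the model above; the Anderson basis assumes columns of the form theta^i / (1 + theta^2)^(5/2), i = 0, 1, 2, with theta = S*(t - T0), which is a reading of (23). The function names, sample grids and parameter values are illustrative.

```python
import numpy as np

def exp_basis(t, B):
    """Matrix whose columns are exp(B_i * t_k)."""
    return np.exp(np.outer(t, B))

def anderson_basis(t, S, T0):
    """Assumed Anderson-type columns theta^i / (1 + theta^2)^(5/2), i = 0..2."""
    theta = S * (t - T0)
    q = (1.0 + theta**2) ** -2.5
    return np.column_stack([q, theta * q, theta**2 * q])

t = np.linspace(0.0, 1.0, 20)
rank_distinct = int(np.linalg.matrix_rank(exp_basis(t, [-1.0, -3.0])))
rank_equal = int(np.linalg.matrix_rank(exp_basis(t, [-2.0, -2.0])))

t_wide = np.linspace(-3.0, 3.0, 41)
rank_anderson = int(np.linalg.matrix_rank(anderson_basis(t_wide, 1.0, 0.0)))
```

With two equal exponent estimates the two columns coincide and the rank drops from two to one, whereas the three Anderson-type columns remain independent for this choice of scale and location.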

Under certain circumstances though, columns could 'numerically'
resemble each other. For instance, if the location parameter
estimate and/or scale estimate are quite far off, the columns in the
matrix of basis functions will assume values close to zero.

Appendix C

The number of computational units (multiplications) necessary
to evaluate the model (23), and associated partial derivatives,
depends on how much extra programming complexity one is willing to
accept.

To establish a lower bound, one should note that the function
evaluation (and thus linear parameter partial derivatives) requires
at least six multiplications. The nonlinear parameter partial
derivatives each require about 18 multiplications.

In addition, a quantity must be raised to the 2.5 power. This
is equivalent to one multiplication, one logarithm and one
exponentiation. Nonlinear function evaluations are approximately
equivalent to eight to twenty multiplications, depending on the
accuracy desired [9], [10].

Assuming that ten multiplications per nonlinear evaluation is
a good estimate, the time required to evaluate the function in
question and its associated partial derivatives is equivalent to
performing 64 or more multiplications.